COMMENTS ON "A NOTE ON OPTIMAL DETECTION
OF A CHANGE IN DISTRIBUTION," BY BENJAMIN YAKIR

By Yajun Mei 

Georgia Institute of Technology 

The purpose of this note is to show that in a widely cited paper 
by Yakir [Ann. Statist. 25 (1997) 2117-2126], the proof that the so- 
called modified Shiryayev-Roberts procedure is exactly optimal is 
incorrect. We also clarify the issues involved by both mathematical 
arguments and a simulation study. The correctness of the theorem 
remains in doubt. 

1. Introduction. In the change-point literature, as in sequential analysis
more generally, theorems establishing exact optimality of statistical
procedures are quite rare. Moustakides [2] and Ritov [5] showed that for the
simplest problem where both the pre-change distribution $f_0$ and the
post-change distribution $f_1$ are fully specified, Page's cumulative sum
(CUSUM) procedure [3] is exactly optimal in the sense of minimizing the
so-called "worst case" detection delay subject to a specified frequency of
false alarms. Earlier, Lorden [1] showed that this optimality property holds
asymptotically.

Besides Page's CUSUM procedure and its generalizations, the most commonly
used and studied approach to defining change-point procedures is that of
Shiryayev [7] and Roberts [6]. Yakir [8] published a proof claiming that,
when both $f_0$ and $f_1$ are fully specified, a modification of the
Shiryayev-Roberts procedure is exactly optimal with respect to a slightly
different measure of quickness of detection. In this note we show that
Yakir's proof is wrong. It is still an open problem whether the modified
Shiryayev-Roberts procedure is in fact optimal, although its asymptotic
optimality was proved in [4].

2. Notation. In this note we use the notation of Yakir [8]. However, there
is one ambiguity between $P_k(\cdot)$ and $P(\cdot \mid \nu = k)$ in [8]. The
change-point $\nu$ is an unknown constant under the non-Bayesian formulation,
but it is a random variable in the auxiliary Bayes problem $B(G,p,c)$. To
avoid confusion, in this note we denote by $\nu$ the change-point only in the
Bayes problem $B(G,p,c)$, and for $1 \le k \le \infty$, we denote by $P_k$
the probability measure (with change time $k$) when the observations
$X_1, X_2, \ldots$ are independent such that $X_1, \ldots, X_{k-1}$ have
density $f_0$ and $X_k, X_{k+1}, \ldots$ have density $f_1$. In other words,
we use $P_k(\cdot)$ in the context of the non-Bayesian formulation, while
$P(\cdot \mid \nu = k)$ is used in the context of the Bayes problem
$B(G,p,c)$. A critical mistake was made in the proof presented by Yakir [8]
because of the confusion between $P_k(\cdot)$ and $P(\cdot \mid \nu = k)$,
especially when $k = 1$.

The modified Shiryayev-Roberts procedure, proposed in [4], is defined by

(1)  $N_A^* = \inf\{n \ge 0 : R_n^* \ge A\}$,

where

     $R_n^* = (1 + R_{n-1}^*) \frac{f_1(X_n)}{f_0(X_n)}$,

and $R_0^* \in [0, \infty)$ has a distribution chosen by the statistician.
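The recursion and stopping rule in (1) can be sketched in a few lines of Python. This is only an illustration under stated assumptions, not code from [4] or [8]: the function name `modified_sr_stopping_time` and its arguments (`observations`, `likelihood_ratio`, `r0`, `threshold`) are hypothetical, and the starting value `r0` is assumed to have been drawn beforehand from whatever distribution the statistician chose for $R_0^*$.

```python
import math
from typing import Callable, Iterable, Optional


def modified_sr_stopping_time(
    observations: Iterable[float],
    likelihood_ratio: Callable[[float], float],
    r0: float,
    threshold: float,
) -> Optional[int]:
    """Return the first n >= 0 with R_n >= threshold, or None if the data run out.

    R_0 = r0 and R_n = (1 + R_{n-1}) * likelihood_ratio(X_n), where
    likelihood_ratio(x) stands for f_1(x) / f_0(x).
    """
    r = r0
    if r >= threshold:
        return 0  # stopping at time 0 is allowed when R_0 already exceeds A
    for n, x in enumerate(observations, start=1):
        r = (1.0 + r) * likelihood_ratio(x)
        if r >= threshold:
            return n
    return None


if __name__ == "__main__":
    # Illustrative likelihood ratio for f_0 = Exp(1), f_1 = Exp(rate 2).
    lr = lambda x: 2.0 * math.exp(-x)
    print(modified_sr_stopping_time([0.1, 0.2], lr, r0=0.5, threshold=1.5))
```

Returning `None` when the finite input is exhausted is a design choice of this sketch; the stopping time in (1) is defined on an infinite sequence of observations.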

For the right distribution of $R_0^*$, the asymptotic optimality of $N_A^*$ was
proved in [4]. Later Yakir [8] claimed that $N_A^*$ is exactly optimal in the
sense of minimizing the "average" detection delay

(2)  $\mathcal{D}(N) = \sup_{1 \le k < \infty} E_k(N - k + 1 \mid N \ge k - 1)$

among all stopping times $N$ satisfying $E_\infty N \ge E_\infty N_A^*$. In this note, we
explain what is wrong with Yakir's proof.



3. Theoretical results. In order to prove optimality properties of $N_A^*$ for
the right distribution of $R_0^*$, Pollak [4] and Yakir [8] considered the
following extended Bayes problem $B(G,p,c)$. Let $G$ be a distribution over
the interval $[0,1]$. Suppose $0 < p < 1$. Assume that a random variable
$\pi_0$ is sampled from the distribution $G$ before taking any observations.
Given the observed value of $\pi_0$, suppose the prior distribution of the
change-point $\nu$ is given by $P(\nu = 1) = \pi_0$ and
$P(\nu = n) = (1 - \pi_0) p (1 - p)^{n-2}$ for $n \ge 2$. Consider the
problem of minimizing the risk

     $\mathcal{R}(N) = P(N < \nu - 1) + c E(N - \nu + 1)^+$,

where $c > 0$ can be thought of as the cost per observation of sampling after
a change. It is well known [7] that the Bayes solution of this extended Bayes
problem $B(G,p,c)$ is of the form



M GiPiC = inf{n>0:i?* in >^}, 
where $q = 1 - p$, and

where $\pi_0$ has a distribution $G$. Yakir [8] showed that for some sequence
of $p \to 0$, there exists a sequence of $G = G_p$ and $c = c_p$ such that
$c \to c^*$ and $\pi_0/p \to R_0^* + 1$ in distribution, and so $N_A^*$
defined in (1) is a limit of Bayes solutions $M_{G,p,c}$. Yakir [8] claimed
that the Bayes solution $M_{G,p,c}$ satisfies (see lines 11-12 on page 2123)

(3)  $\lim_{p \to 0} \frac{1 - \mathcal{R}(M_{G,p,c})}{p} = (1 - c^* E_1 N_A^*)[E R_0^* + 1 + E_\infty N_A^*]$.

The proof of the exact optimality of the modified Shiryayev-Roberts
procedure in [8] is based on this equation. However, the next theorem shows
that equation (3) does not hold in general.

Theorem 1.

(4)  $\lim_{p \to 0} \frac{1 - \mathcal{R}(M_{G,p,c})}{p} = [E R_0^* + 1 + E_\infty N_A^*]$
       $- c^* [E_1(R_0^* N_A^*) + (E_1 N_A^*)(1 + E_\infty N_A^*)]$.

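Proof. For the extended Bayes problem $B(G,p,c)$, any stopping rule
$N$ satisfies

(5)  $\frac{1 - \mathcal{R}(N)}{p} = \frac{P(N \ge \nu - 1)}{p} [1 - c E(N - \nu + 1 \mid N \ge \nu - 1)]$.

Yakir [8] correctly showed that

(6)  $\lim_{p \to 0} \frac{P(M_{G,p,c} \ge \nu - 1)}{p} = E R_0^* + 1 + E_\infty N_A^*$.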

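Arguing as in Lemma 13 of [4], we have

(7)  $\lim_{p \to 0} E(M_{G,p,c} - \nu + 1 \mid M_{G,p,c} \ge \nu - 1)$
       $= \frac{E_\infty N_A^*}{E R_0^* + 1 + E_\infty N_A^*} E_1 N_A^* + \frac{E R_0^* + 1}{E R_0^* + 1 + E_\infty N_A^*} \lim_{p \to 0} E(N_A^* \mid \nu = 1)$,

and the limiting distribution of $R_0^*$ conditional on $\{\nu = 1\}$ has the density

     $\frac{(x + 1)\, d\phi_0(x)}{\int (x + 1)\, d\phi_0(x)} = \frac{(x + 1)\, d\phi_0(x)}{E R_0^* + 1}$,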

where $\phi_0(x)$ is the unconditional distribution of $R_0^*$. Yakir [8] made a
critical mistake by thinking that the limiting distribution of $R_0^*$
conditional on $\{\nu = 1\}$ is just $\phi_0(x)$. Since $R_0^* \ge 0$, the
stopping times $N_A^*$ are dominated by the Shiryayev-Roberts stopping time
with $R_0^* = 0$. Thus, by the dominated convergence theorem,

(8)  $\lim_{p \to 0} E(N_A^* \mid \nu = 1) = \lim_{p \to 0} E(E(N_A^* \mid R_0^*, \nu = 1) \mid \nu = 1)$
       $= \frac{E_1(N_A^* (R_0^* + 1))}{E R_0^* + 1}$.

Theorem 1 follows at once from (5)-(8) and the fact that $c = c(p) \to c^*$ as
$p \to 0$. □
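For readers who want the bookkeeping behind "follows at once", the LaTeX sketch below combines (5)-(8). The shorthand $S = E R_0^* + 1 + E_\infty N_A^*$ is introduced here for brevity and does not appear in [4] or [8].

```latex
% Shorthand (an assumption of this sketch): S = E R_0^* + 1 + E_\infty N_A^*.
\begin{align*}
\lim_{p\to 0}\frac{1-\mathcal{R}(M_{G,p,c})}{p}
 &= S\Bigl[1 - c^*\Bigl(\frac{E_\infty N_A^*}{S}\,E_1 N_A^*
      + \frac{E R_0^*+1}{S}\cdot
        \frac{E_1\bigl(N_A^*(R_0^*+1)\bigr)}{E R_0^*+1}\Bigr)\Bigr] \\
 &= S - c^*\bigl[(E_\infty N_A^*)(E_1 N_A^*)
      + E_1(R_0^* N_A^*) + E_1 N_A^*\bigr] \\
 &= \bigl[E R_0^* + 1 + E_\infty N_A^*\bigr]
    - c^*\bigl[E_1(R_0^* N_A^*) + (E_1 N_A^*)(1 + E_\infty N_A^*)\bigr].
\end{align*}
```

If the last fraction in the first line is replaced by $E_1 N_A^*$, which is what results when the conditional law of $R_0^*$ given $\{\nu = 1\}$ is taken to be $\phi_0$, the same algebra gives $S(1 - c^* E_1 N_A^*)$, that is, the right-hand side of (3).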

A comparison of equations (3) and (4) shows that the major problem in
Yakir's proof comes from the fact that the term $E_1(R_0^* N_A^*)$ is missing. To
further demonstrate this, as suggested by one referee, let us consider

(9)  $C(N) = \lim_{p \to 0} \frac{1 - \mathcal{R}(N)}{p}$

for a given stopping time $N$. Since $M_{G,p,c}$ are Bayesian solutions, we have

(10)  $\lim_{p \to 0} \frac{1 - \mathcal{R}(M_{G,p,c})}{p} \ge C(N)$

for any given stopping time $N$. Yakir [8] used inequality (10) and equation
(3) to prove the exact optimality of $N_A^*$. In the following, we illustrate why
Yakir's proof fails.
Note that

     $1 - \mathcal{R}(N) = P(N \ge \nu - 1) - c E(N - \nu + 1)^+$
       $= E_{\pi_0} \sum_{k=1}^{\infty} P(\nu = k) \bigl( P(N \ge k - 1 \mid \nu = k, \pi_0) - c E((N - k + 1)^+ \mid \nu = k, \pi_0) \bigr)$,

where $E_{\pi_0}$ denotes expectation with respect to $\pi_0$. Here it is important to
point out that $P(\cdot \mid \nu = k, \pi_0)$ is the same as $P_k(\cdot)$ but
$P(\cdot \mid \nu = k)$ is different from $P_k(\cdot)$ because the prior
distribution of $\nu$ depends on $\pi_0$. Since
$P_k(N \ge k - 1) = P_\infty(N \ge k - 1)$, we have

     $1 - \mathcal{R}(N) = E_{\pi_0} \Bigl[ \sum_{k=1}^{\infty} P(\nu = k) \bigl( P_\infty(N \ge k - 1) - c E_k(N - k + 1)^+ \bigr) \Bigr]$
       $= E_{\pi_0} \Bigl[ \pi_0 \bigl( P_\infty(N \ge 0) - c E_1 N \bigr)$
         $+ (1 - \pi_0) p \sum_{k=2}^{\infty} (1 - p)^{k-2} \bigl( P_\infty(N \ge k - 1) - c E_k(N - k + 1)^+ \bigr) \Bigr]$.



Using the facts that $c \to c^*$ and $\pi_0/p \to R_0^* + 1$ in distribution, for any given
stopping time $N \ge 0$, we have

     $C(N) = E_{R_0^*} (R_0^* + 1) \bigl( P_\infty(N \ge 0) - c^* E_1 N \bigr)$
       $+ \sum_{k=2}^{\infty} \bigl( P_\infty(N \ge k - 1) - c^* E_k(N - k + 1)^+ \bigr)$,

where $E_{R_0^*}$ denotes expectations with respect to $R_0^*$. Observe that

     $E_{R_0^*}((R_0^* + 1) E_1 N) = E(E(R_0^* E_1 N \mid R_0^*)) + E_1 N$
       $= E_1(R_0^* N) + E_1 N$

because the properties of $R_0^*$ are the same under any probability measure
$P_k$ since $R_0^*$ is chosen by the statistician before taking any observations. It is
important to point out that $R_0^*$ and the stopping time $N$ may or may not be
correlated under $P_1$, depending on whether the stopping rule of $N$ involves
$R_0^*$. Then, by the facts that $P_1(N \ge 0) = 1$ and
$\sum_{k=2}^{\infty} P_\infty(N \ge k - 1) = E_\infty N$, we have

     $C(N) = E R_0^* + 1 - c^* E_1(R_0^* N) + E_\infty N$
       $- c^* \sum_{k=1}^{\infty} E_k(N - k + 1 \mid N \ge k - 1) P_\infty(N \ge k - 1)$.
fc=i 

On the one hand, for any stopping time $N \ge 0$, by the definition of $\mathcal{D}(N)$ in
(2) and the fact that $\sum_{k=1}^{\infty} P_\infty(N \ge k - 1) = E_\infty N + 1$,

(11)  $C(N) \ge E R_0^* + 1 - c^* E_1(R_0^* N) + E_\infty N - c^* \mathcal{D}(N)(E_\infty N + 1)$.

On the other hand, $N_A^*$ is a so-called equalizer rule in the context of the
non-Bayesian formulation, that is, for all $k \ge 1$,
$E_k(N_A^* - k + 1 \mid N_A^* \ge k - 1) = E_1 N_A^* = \mathcal{D}(N_A^*)$. Hence,

(12)  $C(N_A^*) = E R_0^* + 1 - c^* E_1(R_0^* N_A^*)$
        $+ E_\infty N_A^* - c^* \mathcal{D}(N_A^*)(E_\infty N_A^* + 1)$,

which is exactly the right-hand side of (4) in Theorem 1. Also see Lemma 13
of [4]. Thus, relation (10) is equivalent to stating that $C(N_A^*) \ge C(N)$ for any
stopping time $N$.
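The equalizer property can be checked numerically in the exponential example of Section 4 ($f_0$ exponential with rate 1, $f_1$ exponential with rate 2, $R_0^* = (R^* + 1)Z$). The Python sketch below is an illustration written for this note's setting, not code from [4] or [8]; the function names `delay_if_no_false_alarm` and `conditional_mean_delay`, the seed and the number of repetitions are all assumptions. It simulates under $P_k$ and averages $N_A^* - k + 1$ over the runs with $N_A^* \ge k - 1$.

```python
import math
import random


def delay_if_no_false_alarm(A, k, rng):
    """Simulate one run under P_k; return N - k + 1 if N >= k - 1, else None."""
    # R_0^* = (R^* + 1) Z with (R^*, Z) uniform on [0, A] x [0, 2].
    r = (rng.uniform(0.0, A) + 1.0) * rng.uniform(0.0, 2.0)
    n = 0
    while r < A:
        n += 1
        # X_n has density f_0 (rate 1) for n < k and f_1 (rate 2) for n >= k.
        x = rng.expovariate(1.0) if n < k else rng.expovariate(2.0)
        r = (1.0 + r) * 2.0 * math.exp(-x)  # f_1(x) / f_0(x) = 2 exp(-x)
    return n - k + 1 if n >= k - 1 else None


def conditional_mean_delay(A, k, reps=100_000, seed=0):
    """Monte Carlo estimate of E_k(N - k + 1 | N >= k - 1)."""
    rng = random.Random(seed)
    total, count = 0.0, 0
    for _ in range(reps):
        d = delay_if_no_false_alarm(A, k, rng)
        if d is not None:
            total += d
            count += 1
    return total / count


if __name__ == "__main__":
    for k in (1, 2, 3, 4):
        print(k, round(conditional_mean_delay(1.5, k, seed=k), 4))
```

Under the equalizer property the printed estimates should agree with one another up to Monte Carlo error. Larger values of `k` leave fewer runs that survive to time `k - 1`, so their estimates are noisier for a fixed `reps`.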
Now let us go back to the non-Bayesian problem in which we are interested
in minimizing the detection delay $\mathcal{D}(N)$ in (2) among all stopping times
$N$ satisfying $E_\infty N \ge E_\infty N_A^*$. Assume $E_\infty N_A^* = B$, and let us consider a
stopping time $N$ which satisfies the false alarm constraint with equality,
that is, $E_\infty N = B$ (without loss of generality we can limit ourselves to this
case). Then from $C(N_A^*) \ge C(N)$ and relations (11) and (12), we have

     $E_1(R_0^* N_A^*) + \mathcal{D}(N_A^*)(1 + B) \le E_1(R_0^* N) + \mathcal{D}(N)(1 + B)$,

from which we cannot conclude that

     $\mathcal{D}(N_A^*) \le \mathcal{D}(N)$

due to the two terms, $E_1(R_0^* N_A^*)$ and $E_1(R_0^* N)$, and the fact that
$E_1(R_0^* N_A^*) \ne E_1(R_0^*) E_1(N_A^*)$ since the stopping rule of $N_A^*$ involves
$R_0^*$. In [8], the above inequality follows immediately because the two terms,
$E_1(R_0^* N_A^*)$ and $E_1(R_0^* N)$, are erroneously missing.
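The cancellation behind the displayed inequality can be written out as follows. This LaTeX sketch assumes $c^* > 0$, so that dividing by $-c^*$ reverses the inequality, and uses $E_\infty N = E_\infty N_A^* = B$.

```latex
% Assumes c^* > 0 and E_\infty N = E_\infty N_A^* = B.
\begin{align*}
& E R_0^* + 1 - c^* E_1(R_0^* N_A^*) + B - c^*\,\mathcal{D}(N_A^*)(1 + B)
   && \text{(this is } C(N_A^*) \text{ by (12))} \\
&\quad \ge C(N) \\
&\quad \ge E R_0^* + 1 - c^* E_1(R_0^* N) + B - c^*\,\mathcal{D}(N)(1 + B)
   && \text{(by (11))} \\
\Longrightarrow\;
& E_1(R_0^* N_A^*) + \mathcal{D}(N_A^*)(1 + B)
   \le E_1(R_0^* N) + \mathcal{D}(N)(1 + B) \\
\Longrightarrow\;
& \mathcal{D}(N_A^*) - \mathcal{D}(N)
   \le \frac{E_1(R_0^* N) - E_1(R_0^* N_A^*)}{1 + B}.
\end{align*}
```

The sign of the right-hand side of the last line is not determined by the argument, which is why $\mathcal{D}(N_A^*) \le \mathcal{D}(N)$ does not follow.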

4. Numerical examples. It is natural to do simulations to confirm that
Yakir's result (3) fails while our result (4) is correct. However, it is difficult
to simulate the value of the left-hand side of these two equations. Now based
on (3), Yakir [8] also showed that

(13)  $E_1 N_A^* = \frac{(\mu_0 + 1)(1 - p_0)}{p_0(\mu_0 + 1) + 1}$,

where

     $p_0 = P(R_0^* \ge A)$  and  $\mu_0 = E(R_0^* \mid R_0^* < A)$.

Yakir is correct in deriving (13) as a consequence of (3). Our result (4)
and the arguments in [8] lead instead to

(14)  $E_1 N_A^* = (\mu_0 + 1)(1 - p_0) - p_0 E_1(R_0^* N_A^*)$.

Therefore, in order to confirm the incorrectness of Yakir's proof and the
existence of the term $E_1(R_0^* N_A^*)$, it suffices to show that (13) fails while (14)
is correct. To illustrate this, we have performed simulations for the following
example, which is considered by Pollak [4] and Yakir [8].

Define $f_0(x) = \exp\{-x\} 1(x > 0)$ and $f_1(x) = 2\exp\{-2x\} 1(x > 0)$, and
pick an $A$ such that $0 < A < 2$. As shown in [8], the randomized
$R_0^* = (R^* + 1)Z$, where $(R^*, Z)$ is uniformly distributed on the set
$[0, A] \times [0, 2]$.

It is straightforward to show that $\mu_0 = A/2$, and

     $p_0 = P(R_0^* \ge A) = 1 - (\log(A + 1))/2$.

Note that Yakir [8] made a minor mistake here by claiming $p_0 = 1 - (\log A)/2$.
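For completeness, here is one way to carry out the "straightforward" computation, written as a LaTeX sketch. It relies only on the stated distribution of $(R^*, Z)$ and on $0 < A < 2$, which guarantees $A/(r + 1) \le A < 2$ for every $r \in [0, A]$, so that $P(Z < t) = t/2$ applies.

```latex
\begin{align*}
P(R_0^* < A)
  &= P\Bigl(Z < \frac{A}{R^* + 1}\Bigr)
   = \frac{1}{A}\int_0^A \frac{1}{2}\cdot\frac{A}{r + 1}\,dr
   = \frac{\log(A + 1)}{2}, \\
p_0 &= 1 - P(R_0^* < A) = 1 - \frac{\log(A + 1)}{2}.
\end{align*}
% Density of R_0^* at a point y in [0, A):
\[
\frac{1}{A}\int_0^A \frac{1}{2}\cdot\frac{1}{r + 1}\,dr
  = \frac{\log(A + 1)}{2A},
\]
% which does not depend on y, so R_0^* given {R_0^* < A} is uniform on
% [0, A) and \mu_0 = A/2.
```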

Table 1 compares the theoretical values of $E_1 N_A^*$ given by (13) and (14) to
Monte Carlo estimates. Our theoretical result (14) was based on Monte Carlo
estimates of $E_1(R_0^* N_A^*)$, while Yakir's result (13) was calculated exactly. In
the Monte Carlo experiment, the number of repetitions was $10^6$ and each
result was recorded as the Monte Carlo estimate ± standard error.

Table 1
Approximations for $E_1 N_A^*$

  A       Monte Carlo         Our result (14)     Yakir's result (13)
  1.5     0.5799 ± 0.0007     0.5806 ± 0.0003     0.4115
  1.6     0.6194 ± 0.0008     0.6197 ± 0.0004     0.4433
  1.7     0.6589 ± 0.0008     0.6594 ± 0.0004     0.4757
  1.8     0.6993 ± 0.0008     0.6998 ± 0.0004     0.5090
  1.9     0.7417 ± 0.0008     0.7404 ± 0.0004     0.5430
  1.98    0.7739 ± 0.0009     0.7739 ± 0.0004     0.5708
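A comparison of this kind can be sketched with the Python standard library alone. The code below is an illustrative reimplementation under stated assumptions, not the program used for Table 1: the function names (`simulate_run`, `monte_carlo`, `formula_13`, `formula_14`), the seed and the default number of repetitions are hypothetical choices. It simulates under $P_1$, estimates $E_1 N_A^*$ and $E_1(R_0^* N_A^*)$, and evaluates (13) and (14) with $\mu_0 = A/2$ and $p_0 = 1 - (\log(A + 1))/2$.

```python
import math
import random


def simulate_run(A, rng):
    """One run under P_1 (every observation has density f_1); return (R_0, N)."""
    # R_0^* = (R^* + 1) Z with (R^*, Z) uniform on [0, A] x [0, 2].
    r0 = (rng.uniform(0.0, A) + 1.0) * rng.uniform(0.0, 2.0)
    r, n = r0, 0
    while r < A:  # N_A^* = inf{n >= 0 : R_n^* >= A}
        x = rng.expovariate(2.0)  # X_n ~ f_1(x) = 2 exp(-2x)
        r = (1.0 + r) * 2.0 * math.exp(-x)  # f_1(x) / f_0(x) = 2 exp(-x)
        n += 1
    return r0, n


def monte_carlo(A, reps=100_000, seed=0):
    """Monte Carlo estimates of E_1 N_A^* and E_1(R_0^* N_A^*)."""
    rng = random.Random(seed)
    total_n = total_rn = 0.0
    for _ in range(reps):
        r0, n = simulate_run(A, rng)
        total_n += n
        total_rn += r0 * n
    return total_n / reps, total_rn / reps


def formula_13(A):
    """Right-hand side of (13)."""
    p0, mu0 = 1.0 - math.log(A + 1.0) / 2.0, A / 2.0
    return (mu0 + 1.0) * (1.0 - p0) / (p0 * (mu0 + 1.0) + 1.0)


def formula_14(A, e1_rn):
    """Right-hand side of (14), given an estimate of E_1(R_0^* N_A^*)."""
    p0, mu0 = 1.0 - math.log(A + 1.0) / 2.0, A / 2.0
    return (mu0 + 1.0) * (1.0 - p0) - p0 * e1_rn


if __name__ == "__main__":
    for A in (1.5, 1.6, 1.7, 1.8, 1.9, 1.98):
        e1_n, e1_rn = monte_carlo(A)
        print(A, round(e1_n, 4), round(formula_14(A, e1_rn), 4),
              round(formula_13(A), 4))
```

With fewer repetitions than the $10^6$ reported above, the estimates carry larger standard errors than those in Table 1, so agreement should be judged to about two decimal places.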

The results in Table 1 suggest that (14) gives correct values for $E_1 N_A^*$ and
(13) does not. These results support the claim that Yakir's proof of exact
optimality of the modified Shiryayev-Roberts procedures is flawed.